Threaded Multiple Path Execution
Authors
Steven
Abstract
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT) processor to speculatively execute multiple paths of execution. When an SMT processor is running fewer threads than it has hardware contexts, TME uses the spare contexts to fetch and execute code along the less likely path of hard-to-predict branches. This paper describes the hardware mechanisms an SMT processor needs to efficiently spawn speculative threads for TME, including the Mapping Synchronization Bus, which enables these multiple paths to be spawned. It also examines policies for deciding which branches to fork and for managing competition between primary and alternate-path threads for critical resources. Our results show that, for programs with a high misprediction rate, TME increases the single-program performance of an SMT processor with eight thread contexts by 14%-23% on average, depending on the misprediction penalty.
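The fork decision described in the abstract (spawn the less likely path of a hard-to-predict branch only when a spare hardware context is free) can be sketched as below. This is a minimal illustration under stated assumptions, not the mechanism from the paper: it assumes hard-to-predict branches are identified with a table of resetting confidence counters, a common branch confidence-estimation technique swapped in here, and the names `ForkPolicy`, `CONF_BITS`, `CONF_THRESHOLD`, and `TABLE_SIZE` are hypothetical. It models only the fork decision and context allocation, not the Mapping Synchronization Bus or the resource-competition policies.

```python
from dataclasses import dataclass, field

# Hypothetical parameters chosen for illustration; not taken from the paper.
CONF_BITS = 4          # width of each resetting confidence counter
CONF_THRESHOLD = 8     # counters below this mark a branch as hard to predict
TABLE_SIZE = 1024      # number of confidence counters, indexed by branch PC


@dataclass
class ForkPolicy:
    num_contexts: int                                      # hardware contexts on the SMT core
    counters: list = field(default_factory=lambda: [0] * TABLE_SIZE)
    busy: set = field(default_factory=set)                 # contexts holding a thread

    def _index(self, pc: int) -> int:
        # Drop the low bits of the branch address and wrap into the table.
        return (pc >> 2) % TABLE_SIZE

    def is_low_confidence(self, pc: int) -> bool:
        return self.counters[self._index(pc)] < CONF_THRESHOLD

    def update(self, pc: int, predicted_correctly: bool) -> None:
        # Resetting counter: count up (saturating) on a correct prediction,
        # reset to zero on a misprediction.
        i = self._index(pc)
        if predicted_correctly:
            self.counters[i] = min(self.counters[i] + 1, (1 << CONF_BITS) - 1)
        else:
            self.counters[i] = 0

    def spare_context(self):
        # Return the first idle context, or None if all are in use.
        for ctx in range(self.num_contexts):
            if ctx not in self.busy:
                return ctx
        return None

    def maybe_fork(self, pc: int):
        # Fork the alternate path only for a low-confidence branch
        # and only if a spare context exists; return the context used.
        if not self.is_low_confidence(pc):
            return None
        ctx = self.spare_context()
        if ctx is not None:
            self.busy.add(ctx)
        return ctx

    def release(self, ctx: int) -> None:
        # Free a context once its alternate path is resolved or squashed.
        self.busy.discard(ctx)


if __name__ == "__main__":
    policy = ForkPolicy(num_contexts=8)
    policy.busy.add(0)                 # the primary thread occupies context 0
    print(policy.maybe_fork(0x400))    # a fresh branch starts at low confidence
```

In this sketch a freshly seen branch is treated as hard to predict until it accumulates `CONF_THRESHOLD` consecutive correct predictions, so forking is biased toward branches with a recent misprediction history; a real design would tune that threshold against how many spare contexts are typically available.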
Similar resources
Instruction Recycling on a Multiple-Path Processor
Processors that can simultaneously execute multiple paths of execution will only exacerbate the fetch bandwidth problem already plaguing conventional processors. On a multiple-path processor, which speculatively executes less likely paths of hard-to-predict branches, the work done along a speculative path is normally discarded if that path is found to be incorrect. Instead, it can be beneficial...
IPC Control for Multiple Real-Time Threads on an In-Order SMT Processor
This paper proposes an architecture for concurrent scheduling of hard, soft and non real-time threads in embedded systems. It is based on a superscalar in-order processor binary compatible with the Infineon TriCore. The architecture allows a tight static WCET analysis of hard real-time threads. To still provide high performance, the absence of speculative elements like branch prediction and out-...
A task parallel algorithm for finding all-pairs shortest paths using the GPU
This paper proposes an acceleration method for finding the all-pairs shortest paths (APSPs) using the graphics processing unit (GPU). Our method is based on Harish’s iterative algorithm that computes the cost of the single-source shortest path (SSSP) in parallel on the GPU. In addition to this fine-grained parallelism, we exploit the coarse-grained parallelism by using a task parallelization sc...
Parallelization Techniques with Improved Dependence Handling
Continuing exponential growth in transistor density and diminishing returns from the increasing transistor count have forced processor manufacturers to pack multiple processor cores onto a single chip. These processors, known as multi-core processors, generally do not improve the performance of single-threaded applications. Automatic parallelization has a key role to play in improving the perfo...
Data Threaded Query Evaluation in Shared-Everything Environments
In this paper, we present data threaded execution, a new strategy to exploit pipelining and intra-operator parallelism in a shared-everything environment. Data threaded execution suffers neither from execution skew caused by workload estimation errors nor from the discretization error of processor scheduling that appears in conventional strategies. Furthermore, data threaded execution avo...